58 research outputs found
A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"
Data valuation is a growing research field that studies the influence of
individual data points for machine learning (ML) models. Data Shapley, inspired
by cooperative game theory and economics, is an effective method for data
valuation. However, it is well-known that the Shapley value (SV) can be
computationally expensive. Fortunately, Jia et al. (2019) showed that for
K-Nearest Neighbors (KNN) models, the computation of Data Shapley is
surprisingly simple and efficient.
In this note, we revisit the work of Jia et al. (2019) and propose a more
natural and interpretable utility function that better reflects the performance
of KNN models. We derive the corresponding calculation procedure for the Data
Shapley of KNN classifiers/regressors with the new utility functions. Our new
approach, dubbed soft-label KNN-SV, achieves the same time complexity as the
original method. We further provide an efficient approximation algorithm for
soft-label KNN-SV based on locality sensitive hashing (LSH). Our experimental
results demonstrate that Soft-label KNN-SV outperforms the original method on
most datasets in the task of mislabeled data detection, making it a better
baseline for future work on data valuation
PresenceSense: Zero-training Algorithm for Individual Presence Detection based on Power Monitoring
Non-intrusive presence detection of individuals in commercial buildings is
much easier to implement than intrusive methods such as passive infrared,
acoustic sensors, and camera. Individual power consumption, while providing
useful feedback and motivation for energy saving, can be used as a valuable
source for presence detection. We conduct pilot experiments in an office
setting to collect individual presence data by ultrasonic sensors, acceleration
sensors, and WiFi access points, in addition to the individual power monitoring
data. PresenceSense (PS), a semi-supervised learning algorithm based on power
measurement that trains itself with only unlabeled data, is proposed, analyzed
and evaluated in the study. Without any labeling efforts, which are usually
tedious and time consuming, PresenceSense outperforms popular models whose
parameters are optimized over a large training set. The results are interpreted
and potential applications of PresenceSense on other data sources are
discussed. The significance of this study attaches to space security, occupancy
behavior modeling, and energy saving of plug loads.Comment: BuildSys 201
Environmental Sensing by Wearable Device for Indoor Activity and Location Estimation
We present results from a set of experiments in this pilot study to
investigate the causal influence of user activity on various environmental
parameters monitored by occupant carried multi-purpose sensors. Hypotheses with
respect to each type of measurements are verified, including temperature,
humidity, and light level collected during eight typical activities: sitting in
lab / cubicle, indoor walking / running, resting after physical activity,
climbing stairs, taking elevators, and outdoor walking. Our main contribution
is the development of features for activity and location recognition based on
environmental measurements, which exploit location- and activity-specific
characteristics and capture the trends resulted from the underlying
physiological process. The features are statistically shown to have good
separability and are also information-rich. Fusing environmental sensing
together with acceleration is shown to achieve classification accuracy as high
as 99.13%. For building applications, this study motivates a sensor fusion
paradigm for learning individualized activity, location, and environmental
preferences for energy management and user comfort.Comment: submitted to the 40th Annual Conference of the IEEE Industrial
Electronics Society (IECON
- …